In this vignette, we explore how to label your track files (activity and pressure) and provide tips to make the exercise more efficient. To see where this exercise fits in with the overall process, see the vignette How to use GeoPressureR.
library(GeoPressureR)
library(raster)
library(plotly)
library(RColorBrewer)
pam_data = pam_read(pathname = system.file("extdata", package = "GeoPressureR"),
                    crop_start = "2017-06-20", crop_end = "2018-05-02")
Motivation
The most important reason motivating manual editing is that pressure mapping relies on precise activity and pressure data. Activity labeling defines stationary periods and flight duration. Short stationary periods can be particularly hard to define, such that expert knowledge is essential. Since flight duration is the key input in the movement model, having an accurate flight duration is critical to correctly estimate the distance traveled by the bird between two stationary periods. The pressure timeseries matching algorithm is highly sensitive to erroneously labeled pressure, such that even a few mislabeled datapoints can throw off the estimation map.
Each species’ migration behaviour is so specific that manual editing remains the fastest option. You can expect to spend from 30 seconds (e.g. Mangrove Kingfisher) to 10 minutes (e.g. Eurasian Nightjar) per track, depending on the complexity of the species’ migration.
Manual editing also provides a sense of what the bird is doing. You will learn how the bird is moving (e.g. long continuous high altitude flight, short flights over multiple days, alternation between short migration flights and stopovers, etc.). It also provides a sense of the uncertainty of your classification, which is useful to understand and interpret your results.
That being said, it is still worth starting the manual editing from an automatically labeled timeseries. pam_classify() defines migratory flight when activity is high for a long period. Refer to the PAMLr manual for possible classification methods.
Basic labeling principles
The procedure involves (1) labeling migratory activity as 1 and (2) labeling the pressure datapoints to be discarded from the matching exercise as 1.
The outcome of the activity labeling is twofold:
- defined stationary periods, during which the bird is considered static relative to the size of the grid (~10-30km). The start and end of each stationary period are then used to define the pressure timeseries to be matched.
- defined flight durations, which are used in the movement model to define the distance between stationary periods.
Labeling pressure deals with situations in which the bird changes altitude. Since the reanalysis data to be matched is provided at ground level, the pressure timeseries of the geolocator must be at a single elevation, so any datapoint recorded at a different altitude must be discarded.
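As a minimal sketch of how activity labels translate into stationary periods and flight durations, the base R snippet below groups a made-up label vector into runs. The label values, the 5-minute sampling interval and the column names are assumptions for illustration only; this is not how pam_sta() is implemented.

```r
# Hypothetical activity labels sampled every 5 minutes
# (1 = migratory flight, 0 = not flying).
label <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0)
dt_min <- 5  # assumed sampling interval in minutes

# Run-length encoding groups consecutive identical labels into runs.
runs <- rle(label)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# One row per run: runs of 0 are stationary periods, runs of 1 are flights.
periods <- data.frame(
  type = ifelse(runs$values == 1, "flight", "stationary"),
  first_index = run_start,
  last_index = run_end,
  duration_min = runs$lengths * dt_min
)
print(periods)
```

Changing a single label splits or merges runs, which is why one mislabeled datapoint can alter both the stationary periods and the flight durations.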
Introduction to TRAINSET
We suggest using TRAINSET, a web-based graphical tool for labeling timeseries. You can read more about TRAINSET on www.trainset.geocene.com and on their GitHub.
The tool interface is quite intuitive. Start by uploading your .csv file (e.g., 18IC_act_pres.csv).

View after uploading a file
A few tips:
- Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labeling (add/remove a label), specifically with SHIFT.
- Because of the large number of datapoints, keeping a narrow temporal window will prevent your browser from becoming slow or unresponsive.
- You can change the “Reference Series” to pressure to see both timeseries at the same time, which helps interpret what the bird is doing.
- Play with the y-axis range to properly see small pressure variations which may not be visible at full range.
- TRAINSET offers more flexibility with the labels than required: you can add and remove label values (bottom-right of the page). In order for trainset_read() to work, do not change, edit or add any label; simply use the ones offered: 0 and 1.
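Before reading a labeled file back into R, it can help to confirm that only the labels 0 and 1 are present. The sketch below does this on a made-up data frame. The column names (series, timestamp, value, label) and the helper find_unexpected_labels() are assumptions for illustration, so adapt them to the header of your exported csv.

```r
# Hypothetical excerpt of a labeled export; column names are an assumption.
labeled <- data.frame(
  series = c("pressure", "pressure", "acceleration", "acceleration"),
  timestamp = c("2017-08-30T23:45:00Z", "2017-08-30T23:50:00Z",
                "2017-08-30T23:45:00Z", "2017-08-30T23:50:00Z"),
  value = c(1002.1, 1001.8, 3, 25),
  label = c("0", "1", "0", "flight"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows whose label is not an allowed value.
find_unexpected_labels <- function(df, allowed = c("0", "1")) {
  df[!(as.character(df$label) %in% allowed), , drop = FALSE]
}

bad <- find_unexpected_labels(labeled)
print(bad)
```

An empty result means every label is one of the two expected values.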
Four tests to check labeling
To assess the quality of your labeling, you can use this script comprising four basic tests.
Test 1: Duration of stationary periods and flights
The first test consists of checking the durations of flights and stationary periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v1.csv")
pam_data = pam_sta(pam_data)
knitr::kable(pam_data$sta[difftime(pam_data$sta$end,pam_data$sta$start, units = "mins")<60 | pam_data$sta$next_flight_duration<30,])
| | start | end | duration | next_flight_duration | sta_id |
|---|---|---|---|---|---|
| 7 | 2017-08-30 23:45:00 | 2017-08-30 23:55:00 | 10 mins | 255 mins | 7 |
| 27 | 2018-04-15 19:30:00 | 2018-04-15 20:10:00 | 40 mins | 85 mins | 27 |
| 30 | 2018-04-29 23:35:00 | 2018-04-29 23:45:00 | 10 mins | 170 mins | 30 |
| 32 | 2018-04-30 19:20:00 | 2018-04-30 19:40:00 | 20 mins | 125 mins | 32 |
| 33 | 2018-04-30 21:45:00 | 2018-04-30 21:55:00 | 10 mins | 65 mins | 33 |
| 34 | 2018-04-30 23:00:00 | 2018-04-30 23:10:00 | 10 mins | 50 mins | 34 |
| 35 | 2018-05-01 00:00:00 | 2018-05-01 00:10:00 | 10 mins | 35 mins | 35 |
| 36 | 2018-05-01 00:45:00 | 2018-05-01 23:30:00 | 1365 mins | 0 mins | 36 |
You may want to check the labeling of flights shorter than 1 hour and the labeling before and after stationary periods shorter than a couple of hours. Using the exact times from the table above, you can edit your labeling in TRAINSET and export a new version of the csv file. Note that the last row has a next_flight_duration of 0 because it is the last stationary period.
Test 2: Pressure timeseries
The second check to carry out before computing the map is to visualize the pressure timeseries and their grouping into stationary periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v2.csv")
pam_data = pam_sta(pam_data)
p <- ggplot() +
  geom_line(data = pam_data$pressure, aes(x=date,y=obs),col="grey") +
  geom_line(data = subset(pam_data$pressure, sta_id != 0),
            aes(x=date,y=obs,col=as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values=rep(RColorBrewer::brewer.pal(9,"Set1"),times=4))
  #scale_colour_brewer(type="qualitative", palette = 'Set1')
ggplotly(p, dynamicTicks = TRUE) %>%
  layout(showlegend=FALSE,
         legend = list(orientation = "h", x = -0.5),
         yaxis = list(title="Pressure [hPa]"))
Plotting this figure with Plotly allows you to zoom in and pan to check that all timeseries are correctly grouped. Make sure each stationary period does not include any pressure measurement from flight (e.g. 1-Sep-2019). You might spot some anomalies in the temporal variation of pressure. In some cases, you can already label the pressure timeseries to remove them.
Test 3: Pressure timeseries match
So far we have checked that the pressure timeseries are correctly labeled with their respective stationary periods, and that they look relatively smooth. At this stage, the timeseries are good enough to be matched with the reanalysis data. The third test consists of finding the location with the best match and comparing the pressure timeseries. This makes it possible to distinguish bird movements from natural variations of pressure. This is by far the most difficult step, and multiple iterations will be necessary to get the best result.
As computation can take some time, we recommend starting with a few long stationary periods and, once the results are satisfactory, moving on to the shorter periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v3.csv")
pam_data = pam_sta(pam_data)
sta_id_keep = pam_data$sta$sta_id[difftime(pam_data$sta$end,pam_data$sta$start, units = "hours")>12]
pam_data$pressure$sta_id[!(pam_data$pressure$sta_id %in% sta_id_keep)] = NA
message("Number of stationary periods to query: ",length(sta_id_keep))
We can estimate the probability map for each stationary period:
raster_list = geopressure_map(pam_data$pressure, extent=c(-16,20,0,50), scale=10, max_sample=100)
prob_map_list = geopressure_prob_map(raster_list)
For each stationary period, we locate the best match and query the pressure timeseries with geopressure_ts() at this location. If you get errors, check the probability map and the best match (see the commented line starting with leaflet()).
ts_list = list()
for (i_r in seq_along(prob_map_list)){
  i_s = metadata(prob_map_list[[i_r]])$sta_id
  # find the max value of probability
  tmp = as.data.frame(prob_map_list[[i_r]],xy=TRUE)
  lon = tmp$x[which.max(tmp[,3])]
  lat = tmp$y[which.max(tmp[,3])]
  # Visual check
  # leaflet() %>% addTiles() %>% addRasterImage(prob_map_list[[i_r]]) %>% addMarkers(lat=lat,lng=lon)
  # index of the pressure datapoints of this stationary period (which() ignores NA)
  id = which(pam_data$pressure$sta_id == i_s)
  # query the pressure at this location
  message("query:",i_r,"/",length(prob_map_list))
  ts_list[[i_r]] = geopressure_ts(lon,
                                  lat,
                                  pressure = pam_data$pressure[id,])
  # Add sta_id
  ts_list[[i_r]]['sta_id'] = i_s
  # Shift the reanalysis timeseries so that its mean matches the geolocator mean
  ts_list[[i_r]]$pressure0 = ts_list[[i_r]]$pressure - mean(ts_list[[i_r]]$pressure) + mean(pam_data$pressure$obs[id])
}
We can now look at a similar figure of pressure timeseries, but this time comparing the geolocator data with the best match from the reanalysis data.
p <- ggplot() +
  geom_line(data=pam_data$pressure, aes(x=date,y=obs), colour="grey") +
  geom_point(data=subset(pam_data$pressure, class), aes(x=date,y=obs), colour="black") +
  geom_line(data=do.call("rbind", ts_list), aes(x=date,y=pressure0,col=as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values=rep(RColorBrewer::brewer.pal(9,"Set1"),times=4))
ggplotly(p, dynamicTicks = TRUE) %>%
  layout(showlegend=FALSE,
         legend = list(orientation = "h", x = -0.5),
         yaxis = list(title="Pressure [hPa]"))
You can use this figure to identify periods where the mismatch indicates a problem with the labeling. Often, it will indicate that the bird was changing altitude. This happens regularly during migration, when the bird lands in one location and performs one or two short flights in the morning, changing altitude. Activity data on TRAINSET can also help in understanding what the bird is doing.
Test 4: Histogram of pressure error
Finally, you can also look at the histogram of the pressure error (geolocator-ERA5). For long stationary periods (over ~5 days), you want to check that there is a single mode in your distribution. Two modes indicate that the bird is spending time at two different altitudes. This is common when birds have a day site and a night roost at different elevations. You should also note the spread of the distribution, as this value can guide you in setting the standard deviation parameter s in geopressure_prob_map().
par(mfrow = c(5,6), mar=c(1,1,3,1))
for (i_r in seq_along(ts_list)){
  i_s = unique(ts_list[[i_r]]$sta_id)
  df3 <- merge(ts_list[[i_r]], subset(pam_data$pressure, !class & sta_id==i_s), by = "date")
  df3$error = df3$pressure0-df3$obs
  hist(df3$error, main = i_s, xlab="", ylab="")
  abline(v = 0, col="red")
}
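To put a number on the spread seen in these histograms, one option is to compute the standard deviation of the error per stationary period. The sketch below uses made-up error values in a hypothetical data frame err; it only illustrates the summary and is not part of GeoPressureR.

```r
# Hypothetical pressure errors (hPa) for two stationary periods.
# Period 2 mimics a bird alternating between two altitudes (two modes).
err <- data.frame(
  sta_id = c(1, 1, 1, 2, 2, 2, 2),
  error = c(-1, 0, 1, -2, -2, 2, 2)
)

# Standard deviation of the error for each stationary period.
spread <- aggregate(error ~ sta_id, data = err, FUN = sd)
names(spread)[2] <- "sd_error"
print(spread)
```

A bimodal error distribution inflates the standard deviation, so it is worth checking the histogram for a single mode before reading too much into this value.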
Common challenges and tips to address them
In the following section, we use examples to illustrate common challenges that may be encountered during manual editing, and suggestions on how to address them.
Outliers during flights due to low bird activity
Birds can have periods of low activity during their flight (e.g., less flapping). In those cases, the automatic labeling of activity with the KNN classifier mislabels these points as stationary periods, as illustrated in this example below for the night of the 31st of August. A single mislabeled point can incorrectly split the flight into multiple short flights. This error will be highlighted by test #1 described above. However, birds might also display lower activity at the beginning or end of their flight, which is often mis-classified, as illustrated in all three nights in the figure below. Test #1 would not be able to pick up on these.

However, if the low activity happens well before the bird reaches the ground, as illustrated in the example below, it is visible in the figure generated by test #2. This is not always the case, so we must assess on a case-by-case basis whether these datapoints should be included in the flight or not.
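To help locate flights that were split by a few low-activity datapoints, the following sketch flags short runs of non-flight labels sandwiched between two flights. The label vector and the max_gap threshold are made-up assumptions; the flagged indices are only candidates to review in TRAINSET, not an automatic correction.

```r
# Hypothetical activity labels (1 = flight, 0 = not flying), one per 5 minutes.
label <- c(0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0)
max_gap <- 2  # assumed threshold: flag gaps of at most 2 samples (10 minutes)

runs <- rle(label)
n_runs <- length(runs$values)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# rle() alternates values, so an interior run of 0 has a flight on both sides.
is_interior <- seq_len(n_runs) > 1 & seq_len(n_runs) < n_runs
is_gap <- runs$values == 0 & runs$lengths <= max_gap & is_interior

# Index range of each suspicious gap, to be checked manually.
suspicious <- data.frame(first_index = run_start[is_gap],
                         last_index = run_end[is_gap])
print(suspicious)
```

Low activity at the very beginning or end of a flight does not form an interior gap, so it still has to be checked visually.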

Importance of zooming in before editing outliers
Anomalies in a pressure timeseries might not be obvious at first sight.

By zooming in to a narrower pressure range, it is easier to understand what is happening. In this example, we have a Tawny Pipit breeding near a mine site with rugged topography. While breeding, it looks like it is staying at a relatively constant elevation, but the sudden drop in pressure towards the end indicates that the bird has changed altitude.

In these cases, the aim is to label all pressure datapoints recorded while the bird was at a different altitude. It is not always easy to distinguish temporal variation of pressure from actual changes in the bird's altitude. In such cases, we suggest keeping only the datapoints that you are confident about (in this case, only the first part of the timeseries) and running test #3.

With a long timeseries such as this one, the matching will easily pick up the right location and the timeseries that you want to match. You then simply have to de-label the datapoints at the end of your timeseries that fit the green ERA5 line. For shorter timeseries, you might need several iterations to pick up the correct match.
Short stationary halts between flights
Interpreting bird behaviour and defining stationary periods can be difficult, for example when birds extend their migration into the day but with lower intensity, so that there is no clear end.

In other cases, the bird stops for a couple of hours and then seems to be active again. This could be low-intensity migratory movement, a short break followed by more migratory flight, or landing at the stopover location followed by relocating in the early morning with the light.

The question is whether to label these breaks as stationary periods or not.
Referring to the pressure timeseries can help assess whether the bird changes location. For example, if the low activity is followed by high activity accompanied by pressure change, we can consider that the bird then changed location, and label the low activity as a stationary period.
However, the bird may also land and then complete local flights within its stopover location (with very little pressure variation), in which case we do not want to create two different stationary periods.
Test 3 will be essential to ensure that no local vertical movement happened. Use the reanalysis data to find the best match.

Mountainous species
Mountainous species will display very specific behaviour with regular altitudinal changes.
This is very clear in the Ring Ouzel’s timeseries, with daily recurring movements that are not regular enough to make the process automatic, and with the bird sometimes changing altitude. Choosing which datapoints to keep and which to discard might not always be easy. Both the 790hPa and 900hPa levels might work.
It’s often a good idea to zoom out on the time axis to see if a certain elevation seems more commonly used. Then proceed as in the Tawny Pipit case, iteratively keeping only the datapoints at the same elevation. Test 4 is often quite useful to make sure you haven’t forgotten any datapoints.


The Eurasian Hoopoe is a bit more difficult because it moves more continuously through the day, showing a more sinusoidal pattern.
This is the most difficult case, as you really can’t distinguish temporal variation from altitude changes.
After some iterations, you’ll end up with something relatively correct. Note that to estimate the uncertainty correctly in such cases, you will have to increase the standard deviation s. However, this behaviour is luckily restricted to its breeding ground.

In some cases, finding a single timeseries is too difficult. This is the case for the wintering site of this Ring Ouzel, which never returns to the same elevation. In such cases, you can discard the entire timeseries and only use the mask of absolute pressure values.

Luckily, mountainous species live in rather narrow areas (mountains). In this case, it is easy to tell from the previous stationary periods that the bird was in Morocco, and with such low pressure (high elevation), only the Atlas Mountains fit the threshold criteria.
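To get a feel for how pressure values relate to elevation, the sketch below converts pressure to an approximate altitude with the international barometric formula for a standard atmosphere. The function name pressure_to_altitude() is hypothetical. This is only a rough rule of thumb, since weather also changes pressure, and it is not the method GeoPressureR uses to build its mask.

```r
# Approximate altitude (m) from pressure (hPa), assuming a standard atmosphere
# with a sea-level pressure of 1013.25 hPa (international barometric formula).
pressure_to_altitude <- function(p_hpa, p0_hpa = 1013.25) {
  44330 * (1 - (p_hpa / p0_hpa)^(1 / 5.255))
}

# Pressure levels of the order of those mentioned for the Ring Ouzel.
round(pressure_to_altitude(c(900, 790)))
```

Under these assumptions, 900hPa and 790hPa correspond to roughly 1 km and 2 km above sea level respectively.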

Future improvements
A lot can be done to improve this process:
- Run TRAINSET offline.
- Bypass the create csv, upload csv and read csv steps by running a browser session directly in R.
- Build an R (Shiny) equivalent of TRAINSET to be directly integrated with the R package. Problem: we can’t find a good package to label points in a figure in R, and we would have to maintain it while TRAINSET is doing that for free.
- Any suggestions? Open an issue on GitHub.